Topic Modeling with Nonparametric Markov Tree
نویسندگان
چکیده
A new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two unique attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics deep in the tree; and (ii) parsimonious sub-topics are manifested, by removing redundant usage of words at multiple scales. The depth and width of the tree are unbounded within the prior, with a retrospective sampler employed to adaptively infer the appropriate tree size based upon the corpus under study. Excellent quantitative results are manifested on five standard data sets, and the inferred tree structure is also found to be highly interpretable.
منابع مشابه
Tree-Structured Stick Breaking for Hierarchical Data
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the component...
متن کاملReliability analysis of repairable systems using system dynamics modeling and simulation
Repairable standby system’s study and analysis is an important topic in reliability. Analytical techniques become very complicated and unrealistic especially for modern complex systems. There have been attempts in the literature to evolve more realistic techniques using simulation approach for reliability analysis of systems. This paper proposes a hybrid approach called as Markov system ...
متن کاملTREE-STRUCTURED STICK BREAKING PROCESSES FOR HIERARCHICAL DATA By Ryan P. Adams, Zoubin Ghahramani and Michael I. Jordan
Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the component...
متن کاملModeling corpora of timestamped documents using semisupervised nonparametric topic models
In this paper we propose a nonparametric topic model to capture the evolution of text over time. Mixture models for modeling text documents based on hierarchical Dirichlet processes (HDP) have been used successfully in recent work to provide a nonparametric prior for the number of topics in the corpus eliminating the need to specify apriori the number of topics. We extend this model to addition...
متن کاملOnline Adaptor Grammars with Hybrid Inference
Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions. This flexibility comes at the cost of expensive inference. We address the difficulty of inference through an online algorithm which uses a hybrid of Markov chain Monte Carlo and variational inference. We show that this inference strategy improves scalability without sacrif...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning
دوره 2011 شماره
صفحات -
تاریخ انتشار 2011